Digital Speech within 125 Hz Bandwidth
by Mike Lebo  N6IEF



Objective

To modify and write the code needed to convert analog voice into narrow-band digital modulation.

Why do this?

The bandwidth of voice is about 2400 Hz. If speech can be reduced to 125 Hz, the gain is 12.8 dB (19.2X). Processing gain in a computer is cost free. The goal of this project is to receive weak signals 9 dB below the SSB (Single Sideband) noise floor of the radio.
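The 12.8 dB figure comes straight from the bandwidth ratio; a few lines of Python confirm the arithmetic:

    import math

    ssb_bandwidth_hz = 2400      # approximate bandwidth of SSB voice
    digital_bandwidth_hz = 125   # bandwidth of this digital speech mode

    ratio = ssb_bandwidth_hz / digital_bandwidth_hz   # 19.2X
    gain_db = 10 * math.log10(ratio)                  # about 12.8 dB
    print(f"ratio = {ratio:.1f}X, processing gain = {gain_db:.1f} dB")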

NOTE: For those who have not worked with digital modulation, there is a prequel at the end which should be read first.

Generating the transmit phonemes

A phoneme is to speech as the alphabet is to reading or writing. Since each person sounds different, the computer must recognize the unique phonemes of the one person operating this software. The software must be able to teach itself those phonemes so that it can recognize that person's voice. This is done by reading words shown on the monitor into the microphone while holding down the space bar of the keyboard.

The code used

The 45 phonemes are represented by a code made up of 1’s and 0’s. The code is similar to a court recorder typing out steno, which can be read back. All code groups start with 1 and end with two or more 0’s. Since phonemes are grouped by the shape of the mouth, tongue and lips, the codes used in one group of phonemes should be as different as possible from those of other groups. Some phonemes last longer than others, and they should have longer codes. Of the 53 codes, only 45 are used, with eight as spares. This code is exactly the same Varicode used in PSK-31 (Phase Shift Keying with 31 Hz bandwidth).

100, 1100, 10100, 11100, 101100, 111100, 1010100, 1011100, 1101100, 1110100, 1111100, 10101100, 10110100, 10111100, 11010100, 11011100, 11101100, 11110100, 11111100, 101010100, 101011100, 101101100, 101110100, 101111100, 110101100, 110110100, 110111100, 111010100, 111011100, 111101100, 111110100, 111111100, 1010101100, 1010110100, 1010111100, 1011010100, 1011011100, 1011101100, 1011110100, 1011111100, 1101010100, 1101011100, 1101101100, 1101110100, 1101111100, 1110101100, 1110110100, 1110111100, 1111010100, 1111011100, 1111101100, 1111110100, 1111111100

As shown, each code is the fastest (shortest) version of its phoneme. By adding one or more extra 0’s to any code, the length of that phoneme is stretched in increments of 1/125 of a second. This is very important because voice speed is constantly changing. The original 45 phonemes are thereby expanded into many new phonemes.
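As a minimal sketch (treating the codes as strings of 1's and 0's), stretching a phoneme is just appending zeros, each worth 1/125 of a second at the 125 Hz symbol rate:

    SYMBOL_RATE_HZ = 125

    def stretch(code: str, extra_zeros: int) -> str:
        """Append extra zeros; each one lengthens the phoneme by 8 ms."""
        return code + "0" * extra_zeros

    # 10100 stretched by three zeros becomes 10100000,
    # the same phoneme lasting 3/125 s (24 ms) longer.
    print(stretch("10100", 3), 3 * 1000 / SYMBOL_RATE_HZ, "ms added")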

The software summary

Voice received through the computer’s microphone is converted into numbers, amplified to a constant level, converted into 16 frequency bands, cut into three parallel 24 ms sections of time, compared in a two-stage process to a library of 45 phonemes that have been made by the operator of the radio, converted to a digital code, stretched to fit the operator’s real speech, and sent to the radio in a modified QPSK-125 format (Quadrature Phase Shift Keying with 125 Hz bandwidth) to be transmitted.

The transmit sequence

The transmit sequence starts with the pressing of the space bar on the computer keyboard and continues until the space bar is released. The computer speaker's D/A (Digital to Analog) converter is forced to zero. The AGC (Automatic Gain Control) is unfrozen.

The 400 ms synchronizing series of alternating ones and zeros is sent to the transmit section of the PSK program. This 125 Hz BPSK code is used by the receiver section of the PSK program on the other computer to re-synchronize its 125 Hz clock. This ensures that the receiver section of the PSK program samples in the middle of each code digit and not during the transitions.

The 66,000 Hz sampling clock drives the A/D (Analog to Digital) converter on the microphone input of the computer. Each clock cycle makes the A/D output a 16-bit signed number. Each number goes to the AGC (Automatic Gain Control) array and the AGC level adjustor.

The AGC is used to amplify the weak signal from the microphone to about 90% of the maximum value of a 16-bit signed number. This is done by a TBD (To Be Determined) method. It will use the normal fast attack and slow decay, but it will be frozen when the space bar is not pressed.
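The AGC method is marked TBD above; the sketch below is only one plausible fast-attack, slow-decay scheme (the attack and decay constants are illustrative guesses, not from this paper):

    # Hypothetical fast-attack, slow-decay AGC sketch; constants are guesses.
    FULL_SCALE = 32767            # largest 16-bit signed value
    TARGET = 0.9 * FULL_SCALE     # amplify to about 90% of full scale

    def agc(samples, env=1.0, attack=0.5, decay=0.9995, frozen=False):
        out = []
        for x in samples:
            if not frozen:                       # frozen while space bar is up
                mag = abs(x)
                if mag > env:
                    env = attack * env + (1 - attack) * mag   # fast attack
                else:
                    env = decay * env                         # slow decay
            out.append(x * TARGET / max(env, 1.0))
        return out, env                          # return env so it can be held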

Some of the numbers from the AGC level adjustor go to 32 FIR (Finite Impulse Response) low-pass filters. A FIR low-pass filter has a cutoff frequency F, a number of taps N, and a sampling rate. The problem with filters is the time difference, DPD (Differential Propagation Delay), between the outputs of high-frequency filters and the outputs of low-frequency filters given the same input to both. The 17 F frequencies for the FIR filters are 8000 Hz, 6083 Hz, 4625 Hz, 3517 Hz, 2674 Hz, 2033 Hz, 1546 Hz, 1176 Hz, 894 Hz, 680 Hz, 517 Hz, 393 Hz, 299 Hz, 227 Hz, 173 Hz, 131 Hz, and 100 Hz.

A first-order attempt to solve the DPD problem is to use a different sampling frequency for each group of two FIR filters. The numbers from the A/D arrive at a 66,000 Hz rate. When every fourth number is used, the new sampling rate is 66,000 Hz divided by 4, or 16,500 Hz. The 16 divide-by numbers are 4, 5, 7, 9, 12, 16, 21, 28, 36, 48, 63, 82, 110, 145, 190, and 251.

For example, the divide-by-4 sampling rate is used by the two highest-frequency FIR low-pass filters, 8000 Hz and 6083 Hz. Both FIR low-pass filters need to have the same number of taps N to ensure that their output numbers are available at the same time, that is, zero DPD. By subtracting the output numbers of these two FIR low-pass filters, new numbers are created at the same sampling rate. These numbers are approximately the instantaneous amplitude of the sound between the two frequencies. In the same way the numbers for each of the other 15 frequency bands are made by two FIR low-pass filters, each pair with its associated sampling rate. NOTE: Each set of two FIR low-pass filters has the same sampling rate and taps N, so their DPD is zero and their output numbers can be subtracted.

The DPD between frequency bands is not zero, but this doesn’t matter because the numbers between frequency bands are never used together.
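Assuming standard FIR design routines are acceptable, one band of this scheme might look like the following sketch (using scipy; the paper leaves the tap count N open, so 63 here is only illustrative):

    import numpy as np
    from scipy.signal import firwin, lfilter

    FS = 66_000                     # A/D sample rate
    DIVIDE_BY = 4                   # first band pair: 66,000 / 4 = 16,500 Hz
    N_TAPS = 63                     # illustrative; the paper leaves N open

    x = np.random.randn(FS)         # stand-in for one second of A/D numbers
    x_dec = x[::DIVIDE_BY]          # use every fourth number, as described
    fs_dec = FS / DIVIDE_BY

    lp_hi = firwin(N_TAPS, 8000, fs=fs_dec)   # 8000 Hz low-pass
    lp_lo = firwin(N_TAPS, 6083, fs=fs_dec)   # 6083 Hz low-pass

    # Same rate and same N means zero DPD, so the outputs subtract cleanly,
    # leaving roughly the sound energy between 6083 Hz and 8000 Hz.
    band = lfilter(lp_hi, 1.0, x_dec) - lfilter(lp_lo, 1.0, x_dec)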

This complicated process is being done to change the time-amplitude energy of voice into the time-frequency energy of speech.

Some people say that there are 44 phonemes plus one extra phoneme for no sound. Dividing the A/D sample clock rate of 66,000 Hz by 1584 makes the phoneme sample interval. This interval is 24 ms. After the start of the phoneme sample interval, the absolute values of the next 12 numbers from each of the 16 frequency bands are examined for the largest value. This is called the peak search process. Just before the end of the interval, say at count 1583 of 1584, the 16 peak numbers are put into the phoneme sample array. The phoneme sample array can be visualized as a blue transparency bar-graph with 16 vertical columns, but it actually is a 16 by 1 array of numbers. This process re-synchronizes the DPD problem to the original 66,000 Hz sample clock of the microphone input A/D.

In order not to miss a phoneme, the above procedure is repeated two more times in parallel, starting at counts 528 and 1056 of the original 1 to 1584. This ensures a new phoneme sample array every 8 ms. The 24 ms time interval is used to detect each of the 45 phonemes, even when the phoneme lasts longer. To reduce the chances of capturing part of one phoneme and part of another, a new set of 16 peak numbers is started every 528 numbers, or 8 ms. The overlapping numbers ensure that a phoneme is not missed.
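A sketch of the peak search, assuming each band's numbers for one 24 ms window have already been collected:

    import numpy as np

    SAMPLES_PER_INTERVAL = 1584      # 24 ms at 66,000 Hz
    STAGGER = 528                    # 8 ms offsets: counts 0, 528 and 1056

    def peak_search(band_windows):
        """band_windows: 16 arrays, one window of numbers per frequency band.
        Returns the 16-by-1 phoneme sample array of per-band peaks."""
        return np.array([np.max(np.abs(w)) for w in band_windows])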

One of three parallel phoneme comparators takes its phoneme sample array and compares it to one of the 45 arrays of 16 numbers from the phoneme library, visualized as a yellow transparency bar-graph. One array is subtracted from the other, visualized as overlapping the yellow and the blue transparencies: the differences show as blue and yellow, and the common part of the bar-graph shows as green. To amplify these 16 differences, each is multiplied by itself to make it positive, and the 16 positive numbers are added together to make the single error number for that comparison. In the same way, the next array of 16 numbers from the phoneme library is subtracted from the original phoneme sample array, and so on until all 45 arrays from the phoneme library are used. The phoneme codes for the three smallest of the 45 possible error numbers are sent to the guesser along with their error numbers and code sizes from the phoneme library. Although this process takes some time, the output rate should be the same as the input rate of one array per 24 ms. Since there are three peak detectors with three comparators staggered 8 ms apart, a phoneme code with its error number and code size is sent into the guesser every 8 ms. The code size is a number from three to ten, which is the number of ones and zeros in that phoneme code.
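The error calculation is a sum of squared differences; a minimal sketch:

    import numpy as np

    def best_three(sample, library):
        """sample: 16-element phoneme sample array.
        library: 45 arrays of 16 numbers. Returns the indices and error
        numbers of the three smallest errors, for the guesser."""
        errors = np.array([np.sum((sample - ref) ** 2) for ref in library])
        order = np.argsort(errors)[:3]
        return order, errors[order]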

The guesser is used to determine which code should be sent to the output Q (queue). The guesser is like a Q with three levels. Three phoneme codes and their error numbers enter the back of the guesser and work their way down to the front, so there are always nine phoneme codes in the guesser. Whenever three codes are entered, three other codes are removed. When three of the same phoneme code are in the guesser, the error number of that phoneme code in the front of the guesser is divided by three. When two of the same phoneme code are in the guesser, the error number of that phoneme code in the front of the guesser is divided by two. After the divides, the phoneme code and code size with the smallest error number of the three in the front of the guesser are sent to the output Q. This happens every 8 ms.
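One way to realize the guesser, as a sketch; the exact bookkeeping is my reading of the description above:

    from collections import Counter, deque

    class Guesser:
        def __init__(self):
            self.levels = deque(maxlen=3)   # three levels of three candidates

        def push(self, candidates):
            """candidates: three (code, error, size) triples from the
            comparators. Returns the (code, size) winner every 8 ms."""
            self.levels.append(candidates)
            if len(self.levels) < 3:
                return None                     # still filling the guesser
            counts = Counter(c for level in self.levels
                             for c, _, _ in level)
            front = [(code, err / counts[code], size)   # divide by 2 or 3
                     for code, err, size in self.levels[0]]
            code, _, size = min(front, key=lambda t: t[1])
            return code, size                   # smallest error wins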

The output Q is a buffer that is used to fix problems that happen when one phoneme transitions to another in our speech. The output Q is used to sort the phoneme codes into groups, like sorting cards into suits. When the phoneme code sent to the back of the output Q is the same as either of the two previous phoneme codes in the output Q, the new phoneme code is moved forward into that phoneme code's group.

One phoneme code is removed from the front of the output Q as each digit of the phoneme code is sent to the transmit part of the PSK program. But before a new phoneme code group is sent to the transmit part of the PSK program, the number of phoneme codes in that group is checked to see that it is at least the minimum number for that code size. When it is less than the minimum number, the group is removed from the output Q.

An extra zero is sent to the transmit part of the PSK program as each extra phoneme code beyond the phoneme code size is removed from the output Q. An example would be the phoneme code 10100, which is different from 10100000 because the sound of the second code lasts 3/125 of a second longer. Although there are only 45 fundamental phoneme codes, there are hundreds of extensions. No extra zeros are sent for the special phoneme code 100, but that code can repeat when needed.

When the output Q does not contain enough phoneme codes, each digit of the code is still sent to the transmit part of the PSK program, but the output Q does not move to the next phoneme code until all the digits of that code are sent. This part is TBD, but there must be a way to make the fill-in code slots equal to the slots removed by the below-minimum codes.

Code sizes (Minimum number) are 3 (2), 4 (2), 5 (3), 6 (4), 7 (4), 8 (5), 9 (6) and 10 (7).
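In code, the table above becomes a simple lookup (a sketch, assuming codes are handled as strings):

    # Minimum repeats required before a group is sent, keyed by code size.
    MIN_COUNT = {3: 2, 4: 2, 5: 3, 6: 4, 7: 4, 8: 5, 9: 6, 10: 7}

    def group_is_kept(code: str, repeats: int) -> bool:
        """True when a group of identical codes meets its minimum number."""
        return repeats >= MIN_COUNT[len(code)]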

At the start of each transmission sequence, when the space bar on the computer keyboard is pressed, the guesser and output Q are filled with a quantity of the code 100, the special code for no-sound, because the computer takes some time to process the numbers from the microphone A/D. At the start of a transmission, these leading 100 special codes are removed from the output Q, and the ones and zeros of the real phoneme codes that follow are sent to the transmit part of the PSK program.

Each digit of the phoneme code is sent serially at an 8 ms rate. This is the same rate at which the error numbers enter the guesser and the same rate at which the audio code modulates the radio transmitter.

At the end of each transmission, when the space bar on the computer keyboard is released, all 100 special codes at the back of the output Q are removed, and the special end code 1111111111 is sent to the output Q and then to the transmit part of the PSK program. This sets the squelch of the receiver section of the PSK program on the other computer.

With today’s computers having 3 GHz clocks and quad processors, twelve billion operations can be done every second. Speech recognition software in 2004 did not have this computer power and did not work very well. In the event the guesser makes a mistake, our brains can deal with the occasional anomalous sound from the computer's speaker. Words may sound mispronounced, but we should know what they mean.

This transmit sequence may look like speech recognition software, but there are two differences. First, speech-to-text software requires the ability to handle spelling and meaning. An example would be the homonyms “to,” “two,” and “too.” Most of the code in speech recognition software would not be used here. Second, speech recognition software has no time limit from sound to text. The transmit sequence of this software requires a minimum fixed time delay.

The receiver sequence

The receiver sequence starts with the release of the space bar on the computer keyboard and continues until the space bar is pressed. The microphone A/D is forced to zero. The guesser is not allowed to send more codes to the output Q.

After the 400 ms BPSK signal re-synchronizes the 125 Hz clock and releases the squelch, the ones and zeros coming from the receive part of the PSK program are sent to the phoneme comparator. The first one after two consecutive zeros starts a new phoneme code. The first code of ones and zeros assumes that a 100 special code for no-sound has just been detected. Since the phoneme code is sent serially, each digit goes to the phoneme code library one at a time, where half of the remaining library is eliminated with each digit after the first one. When the next digit is received, half of that half of the library is eliminated, and so on until two consecutive zeros are detected. That is when the phoneme code is found. Then four phoneme arrays (audio clips) are found in the phoneme library. The first phoneme array is called the main array. It is ((the code size − 2) × 8 ms) long and has ((the code size − 2) × 528) numbers. The next phoneme array is called the zero array. It is 8 ms long and has 528 numbers. The next phoneme array is called the third array. It is the same as the zero array, but each of the numbers is divided by three. The last phoneme array is called the two-thirds array. It is the same as the third array, but each of the numbers is multiplied by two.
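A sketch of slicing the serial bit stream into phoneme codes (stretch zeros stay attached to their code, as described above):

    def split_codes(bits: str):
        """A code starts with the first one after two consecutive zeros;
        any zeros beyond the terminating pair are stretch zeros."""
        codes, current, zeros = [], "", 0
        for b in bits:
            if b == "1" and zeros >= 2 and current:
                codes.append(current)      # previous code plus its stretch
                current, zeros = "", 0
            current += b
            zeros = zeros + 1 if b == "0" else 0
        if current:
            codes.append(current)
        return codes

    print(split_codes("10100000" + "1100"))   # ['10100000', '1100']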

Normally a .wav file would be used for an audio clip, but that won’t work for 8 ms to 64 ms sound clips with 528 to 4224 numbers in each array. A new way to send the numbers to the speaker D/A will be needed; the method is TBD.

When the first two consecutive zeros of the present phoneme code are detected, each of the numbers in the present third array and each of the numbers in the previous two-thirds array are added to make the first blender array. Then each of the numbers in the present two-thirds array and each of the numbers in the previous third array are added to make the second blender array. The first blender array is sent to the sound card D/A buffer of the computer, followed by the second blender array, followed by the main array of the present phoneme code. When another zero is detected after the first two zeros of the present phoneme code, the zero array of the present phoneme code is sent to the sound card D/A buffer for each extra zero.

The two 8 ms blender arrays are used to ease the transition from one phoneme to the next when played on the computer's speaker.
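The blending amounts to a two-step crossfade between the zero arrays of the two phonemes; a sketch:

    import numpy as np

    def blender_arrays(prev_zero, cur_zero):
        """prev_zero, cur_zero: the 528-number zero arrays of the previous
        and present phonemes. Returns the two 8 ms blender arrays."""
        prev_zero = np.asarray(prev_zero, dtype=float)
        cur_zero = np.asarray(cur_zero, dtype=float)
        first = cur_zero / 3 + 2 * prev_zero / 3    # third + two-thirds
        second = 2 * cur_zero / 3 + prev_zero / 3   # two-thirds + third
        return first, second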

Then the next detected phoneme code is sent to the sound card D/A buffer, and so forth. The sampling rate for the D/A is 66,000 Hz because 66,000 Hz was used to make the original phoneme arrays in the look-up library. Although this example uses one set of phoneme voice clips for each phoneme code, the computer contains 11 other sets of phoneme voice clips, which can be selected by the operator pressing one of the F1 through F12 keys on the computer keyboard.

Making the operator’s phonemes sequence

Before the transmit sequence can be used, the phoneme library arrays must be known. This is a one-time-only event, which must be done before the computer is connected to the radio. The operator says words displayed on the computer monitor into the microphone while holding down the space bar on the keyboard.

The same microphone and A/D converter from the transmit section are used to make the numbers of the phoneme, which are then applied to the same FIR filters. After the start of the phoneme sample interval, the absolute values of the next 12 numbers from each of the 16 frequency bands are examined for the largest value. This is the same peak search process as in the transmit section. Just before the end of the interval, say at count 1583 of 1584, the 16 peak numbers are put into the phoneme sample array. The phoneme sample array becomes the library value for that phoneme. But this library value might be wrong, so the word should be repeated and the arrays averaged. When the change in the average is small, there is enough information to use the array. This needs to be done for all 44 phonemes. The no-sound phoneme is the only exception; no testing is required. Any DPD problems are exactly the same in both the transmit sequence and this sequence, so they negate each other.
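The repeat-and-average step might be implemented as a running average with a convergence test (the 2% threshold below is an assumption, not from this paper):

    import numpy as np

    def train_phoneme(arrays, tol=0.02):
        """arrays: repeated 16-element phoneme sample arrays for one phoneme.
        Returns the averaged library array once it stops changing much."""
        avg = np.asarray(arrays[0], dtype=float)
        for n, a in enumerate(arrays[1:], start=2):
            new = avg + (np.asarray(a, dtype=float) - avg) / n
            if np.max(np.abs(new - avg)) < tol * np.max(np.abs(new)):
                return new                     # the average has settled
            avg = new
        return avg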

Making the library sequence at the distribution

The main, zero, third and two-thirds arrays used in the library of the receive section need to be made. Twelve different people should record the 44 phonemes. This will be done in the lab with audio spectrum analyzers and high-tech computers. The numbers in each array must start and end at a zero crossing with a positive slope. This is to prevent discontinuities when any two sets of numbers are joined and then played through the computer speaker. After the main phoneme arrays are made, the zero arrays are made. This could be done in the lab by changing individual numbers in the zero array for the best sound when joined and played on the computer’s speaker. The third array and the two-thirds array are easy to derive.

Conclusion

At this time I have not succeeded in learning programming. Without help modifying and writing code, this project ends at this paper. If you have not made up your mind that this will not work, please contact me at [email protected] or 858-278-5851 or Skype (Michael E. Lebo).

Supplement to digital speech within 125 Hz bandwidth
Why not use voice to text software and text to speech software that is available now?
Timing.   Let's assume the above is working perfectly. Let's also assume one speaker of a stereo carries the original voice and the other carries the sound from the speaker of the receiving computer. Let's also assume the original voice is delayed by the same amount as all the processing. In a three-minute speech, the sound from the two stereo speakers will not stay the same. The best example of this would be the varying sound of a record played with its hole drilled off center. The timing of the digital speech within 125 Hz bandwidth software is accurate to within plus-or-minus 4 ms at any time, no matter how long the original live speech lasts.
Non-text words that are commonly spoken.   Al Capp made an art form of spoken words that are not easily written or read in his comic strip "Li'l Abner". How many times have you used the word waja? When you can't understand someone, you say "Waja say?" (What did you say?) Voice-to-text software and text-to-speech software can't handle this, but the digital speech within 125 Hz bandwidth software would not recognize anything unusual.
Shannon–Hartley capacity theorem
Shannon's Law (actually the Shannon–Hartley capacity theorem), relating noise, bandwidth, signal power, and capacity, can be found in any good communication text. In common terms, the bandwidth of voice cannot be compressed. I believe that voice is made of speech and pitch and volume and timing. Digital voice is high fidelity. Digital speech is understanding the meaning of the words without recognizing the person's voice. When you read something, does your mind recognize the voice of the writer? Speech is the part of voice that contains meaning. It is the part that gets ham radio operators DX contacts. Speech can be compressed in bandwidth because it is made of phonemes that are quantized into an integer number of parts (45).
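For reference, the theorem itself says the capacity C of a channel of bandwidth B with signal-to-noise ratio S/N is:

    C = B log2(1 + S/N)   bits per second

Nothing here tries to beat that limit: the 125 Hz signal carries phoneme codes, not the voice waveform itself.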

Since the phoneme comparator in the transmit sequence uses the voice of the sender, it will also use the pitch of the sender. Since the phoneme length is variable in 8 ms steps, this software will take care of timing without exceeding the 125 Hz bandwidth. Our radios have automatic volume circuits (speech processors) to keep a constant volume, which means volume is not as important as speech. Again in common terms, we can understand the words of an old man or a little girl even though the voices are very different.

The software actually uses the pitch of the sender's voice when the transmit section of the software learns the sender's phonemes. When the receive section plays one of the twelve sets of phonemes, that will include the pitch of whichever of the twelve people made those audio clips for the phoneme library.
Channel spacing
As ham radio operators, we have clear channels now that the sunspot cycle is at its minimum. When the sunspot cycle is at its maximum, there won't be many clear channels. With a 19.2-times reduction of bandwidth compared to SSB voice, there will be many more clear channels.
Why has this not been done before?
Unfortunately this project was not invented by one of the "big guns" in the digital voice and speech recognition software group. "Not invented here" applies.

Timing is critical for this to work, and that part of software is not what is usually taught in schools.
Old computers are no match for the speed and memory of today's computers. But people only remember the sound of the "Speak & Spell".

In order for this project to work throughout the world, the software must be free. There is no hardware required beyond that used for PSK-31. The software cannot be patented because my paper has been published on the internet. There is no way to make money from doing this. People who work in the digital voice and speech recognition business could lose their livelihoods if this project were to succeed.
What would it sound like if the voice of the person doing the sending was also used to make up the phoneme library for receiving?
I was hoping to be one of those twelve people. It makes me sad, disappointed and angry to know there was nothing else I could do to answer this question.
Conclusion
Without someone to do it, this project will end in 2008.

A prequel to digital speech within 125 Hz bandwidth


Phase Shift Keying (PSK) is composed of two parts, Bi-Phase Shift Keying (BPSK) and Quadrature Phase Shift Keying (QPSK). This is a narrative on how BPSK-31 and QPSK-125 work. It also explains some of the properties of voice and how they relate to digital speech.

In the beginning of radio, the carrier was modulated by turning it on and off, On-Off Keying (OOK). This became known as CW when used with Morse code. Phone was added when the modulation was changed to AM. This form of modulation has a carrier and an upper and a lower sideband. An improvement was made when one of the sidebands and the carrier were removed, which allowed better use of the radio spectrum and better efficiency of the transmitted power. This single sideband (SSB) modulation is what we use in our radios today for point-to-point communication. By connecting our radios to computers, other forms of digital modulation can be created.

The details of how a radio is connected to a computer are complex and not covered here, but it happens somehow.

Inside a computer are numbers. A program could make numbers that represent a digital audio sine wave and send these numbers at a sample rate to the hardware of the audio sound card, where they are converted into an AC voltage. The part of the sound card that does this is the Digital to Analog converter (D/A). Then this AC voltage is used to modulate the transmitter part of the radio, which creates a constant carrier that is sent to the antenna. If the program were modified to only make numbers when a key on the keyboard was pressed, let's say the notorious "any key", and the operator were to press the key using Morse code, a CW transmission could be sent. It is silly to use a computer and keyboard as a telegraph key, but it shows that the computer can modulate the audio, which in turn modulates the radio.

Two things are needed to turn this silly program into BPSK-31. First, each key on the keyboard needs to be given a number or code. This is called the Varicode. The rules for this code are that the first digit must be a one and the last two digits must be zeros. There are 128 codes used in this computer keyboard transmission. They include the alphabet, both upper and lower case, numbers, punctuation, special characters and typing operators like carriage returns and line feeds. The number of digits in each code ranges from three to fourteen, with the most used keys given the smaller number of digits, just like Morse code. NOTE: Some of you use the Caps Lock key while typing. In today's text messaging world, this is called shouting or yelling. Capital letters have more digits than lower case letters and take more time to transmit. As each key is pressed on the keyboard, its Varicode is sent to a buffer for storage, and the ones and zeros are then sent to the phase shift modulator serially at a 31.25 Hz rate. Why pick 31.25 Hz? It turns out that there is an 8,000 Hz oscillator in each sound card. No matter what kind of sound card or what kind of computer or what operating system is used, they all have this 8,000 Hz oscillator. 8,000 Hz divided by 256 is 31.25 Hz. 31.25 Hz is the rate at which each digit is removed from the buffer.

The other thing needed to turn this into BPSK-31 is to reverse the phase of the audio made by the numbers in the computer. This is easy to do by negating each of the numbers sent to the audio sound card when a one is removed from the buffer and not negating them when a zero is removed from the buffer. But there is a big problem. When there is an abrupt transition in the voltage, distortion occurs in the form of harmonics. The only number that does not make harmonic distortion when negated is zero. So the values of the numbers are slowly reduced to zero at the transition from a one to a zero or from a zero to a one, and slowly enlarged to full value after the transition. A formula has been developed to adjust the slopes of the reduction and enlargement so that minimum distortion happens.
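A sketch of this shaping, following the convention in the text that a one reverses the phase; the raised-cosine ramp is my assumption for the "formula" mentioned above:

    import numpy as np

    FS = 8000                    # the sound card's 8,000 Hz rate
    SPB = 256                    # samples per digit: 8000 / 31.25
    F_TONE = 1000                # illustrative audio tone frequency

    def bpsk31(bits):
        half = SPB // 2
        ramp = 0.5 * (1 - np.cos(np.pi * np.arange(half) / half))  # 0 -> 1
        env = np.ones(len(bits) * SPB)
        sign = np.ones(len(bits) * SPB)
        cur = 1.0
        for i, b in enumerate(bits):
            if b == 1 and i > 0:                      # a one reverses the phase
                env[i*SPB - half:i*SPB] = ramp[::-1]  # fade down to zero
                env[i*SPB:i*SPB + half] = ramp        # fade back up
                cur = -cur                            # negate at the zero point
            sign[i*SPB:(i+1)*SPB] = cur
        t = np.arange(len(env)) / FS
        return np.sin(2 * np.pi * F_TONE * t) * env * sign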

Now let's take a look at the receive part of BPSK-31. Again, the details of how a radio is connected to a computer are complex and not covered here, but it happens somehow.

The sound from the radio enters the computer through the sound card. The voltage of the sound goes to an Analog to Digital converter (A/D), where numbers are made at the sound card sample rate. These numbers go to three pieces of test equipment and the rest of the program.

The first piece of test equipment is the audio spectrum analyzer. The horizontal scale of this graph shows the frequency, with a full range of about 100 Hz to 3,000 Hz on a linear scale. The vertical scale of this graph shows the amplitude at that frequency on a logarithmic scale. The noise seen on the display rolls off at about 300 Hz and about 2,400 Hz. These roll-offs are made by the hardware filters in the radio. A strong signal can be seen as vertical lines above the noise. The computer mouse can move the cursor to the signal, and the scale can be changed to zoom in on that signal. Now a very important thing happens. As the bandwidth is reduced by zooming in, the noise level is reduced. This is because only the noise within the new bandwidth is seen, which is always less than 100% of the noise. The closer you zoom in, the lower the noise. If you zoom into a bandwidth of 50 Hz, over 98% of the noise will be removed. A signal which was previously below the noise could now be much larger than the noise, because the signal strength does not change when the bandwidth is reduced. This is why BPSK-31 is so good. Still, the spectrum analyzer is not a good tool for finding signals below the noise floor.

The next piece of test equipment is the waterfall. The horizontal scale is the full frequency scale of the spectrum analyzer. The vertical scale shows a straight line. The intensity at each point of the line is the average peak amplitude at that frequency. The peak amplitude is the signal plus the noise, which is always larger than the noise alone, even if the signal is weaker than the noise. After the averaging time of about one second, the line is frozen and moved down vertically. Then a new line is made. This process is continually repeated. It looks like a waterfall as the lines move down the screen. When all the lines are looked at together, the peak intensities form vertical lines. This is a good piece of test equipment for seeing signals that are below the noise floor. Again, the computer mouse can move the cursor to a vertical line on the waterfall. The cursor is actually the center of a narrow-band digital software filter that the PSK program uses to pass the PSK signal and reject all other noise and signals. The cursor is also the frequency of the audio sine wave made by the computer-generated numbers.

The last piece of test equipment is a vector scope (phase scope). It shows the phase of the detected signal. In order to measure the phase of two signals, they both must be on the same frequency. The PSK program has a built-in Automatic Frequency Control (AFC) that puts the phase detector at the same audio frequency as the received signal. This test equipment allows us to see if the signal is BPSK or QPSK.

The phase detector creates a zero when the phases are equal and a one when the phases are opposite. These ones and zeros are used by the PSK program to find the Varicode, which is used to display the received message characters on the monitor. It is important for the computer to check the phase in the middle of the digit, not during the transition. This is done with a second AFC at 31.25 Hz that uses a zero crossing detector and a delay.

When a new transmission starts, a series of ones and zeros is transmitted to synchronize the 31.25 Hz receive clock and release the receiver squelch. The series of ones and zeros is used because it has the maximum number of zero crossings. At the end of every transmission a special code is sent to the receiver to set the squelch, which stops the display of random characters made by noise. When the transmit buffer is empty, the series of ones and zeros is sent until the next key is pushed and a Varicode is put into the transmit buffer.

QPSK-31 is the same as BPSK-31, but there are four phases that are 90° apart. The phase 0° could be used for the Varicode digits 00. The phase 90° could be used for 01. The phase 180° could be used for 11. The phase 270° could be used for 10. This would double the transfer rate, but this isn't what is done. Instead an error-correcting algorithm is used to fix errors in the received code. I won't try to explain how it works, but the net result is that when a digit is detected, its value is checked against the previous four values and a final decision is made as to whether this digit is a one or a zero. The result is that one out of five digits could be wrong and the received code will still fix itself to 100% copy, while the transfer rate stays at the original 31.25 Hz.

To get from QPSK-31 to QPSK-125, the 8,000 Hz oscillator in the sound card is divided by 64 to get 125 Hz, and the receive software bandpass filter is widened from 50 Hz to 150 Hz.

Voice has at least four properties: volume, timing, speech and pitch. Our radios have processors that try to keep a constant volume, so my project doesn't transmit the volume. Timing is very important and needs to be transmitted. We are constantly changing the speed at which we talk. Timing is not accounted for in speech recognition software. Pitch does not need to be transmitted to understand speech, but it is essential for recognizing the voice of the operator. A complete branch of applied science is devoted to speech. Unfortunately they don't have their act together, in that some of them say there are 44 fundamental blocks of speech called phonemes and others say there are only 40 phonemes. I know that the most important phoneme is no-sound. The word "at" could not be said without a no-sound between the "a" and the "t".


It is obvious that what needs to be done is to assign a Varicode number to each phoneme. The shortest phoneme should be given the Varicode number with the fewest digits. For example, the Varicode 100 is given to no-sound. Only 45 of the 126 Varicode numbers are used, and the maximum length is ten digits. The worst case is 12.5 phonemes per second, but because most phonemes have a small number of digits, the average worst case is about 16 phonemes per second. I am not sure how many phonemes per second we use in our speech, but I think that should be good enough for slow talking.

My paper "Digital Speech within 125 Hz Bandwidth" may or may not work. Unless someone with the software expertise tries my idea, we will never know if my digital speech is better than SSB.